<!DOCTYPE html>
<html prefix="og: http://ogp.me/ns# article: http://ogp.me/ns/article# " lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>6.Utilizing Parallel Python | 绿萝间</title>
<link href="../assets/css/all-nocdn.css" rel="stylesheet" type="text/css">
<link href="../assets/css/ipython.min.css" rel="stylesheet" type="text/css">
<link href="../assets/css/nikola_ipython.css" rel="stylesheet" type="text/css">
<meta name="theme-color" content="#5670d4">
<meta name="generator" content="Nikola (getnikola.com)">
<link rel="alternate" type="application/rss+xml" title="RSS" href="../rss.xml">
<link rel="canonical" href="https://muxuezi.github.io/posts/6utilizing-parallel-python.html">
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
    tex2jax: {
        inlineMath: [ ['$','$'], ["\\(","\\)"] ],
        displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
        processEscapes: true
    },
    displayAlign: 'center', // Change this to 'center' to center equations.
    "HTML-CSS": {
        styles: {'.MathJax_Display': {"margin": 0}}
    }
});
</script><!--[if lt IE 9]><script src="../assets/js/html5.js"></script><![endif]--><meta name="author" content="Tao Junjie">
<link rel="prev" href="slott-2015115-2015-10-01-report.html" title="双色球2015115期(2015-10-01)数据分析报告" type="text/html">
<link rel="next" href="dlott-15115-2015-10-03-report.html" title="大乐透15115期(2015-10-03)数据分析报告" type="text/html">
<meta property="og:site_name" content="绿萝间">
<meta property="og:title" content="6.Utilizing Parallel Python">
<meta property="og:url" content="https://muxuezi.github.io/posts/6utilizing-parallel-python.html">
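<meta property="og:description" content="Using the Parallel Python module. In the previous chapter we worked through two examples with the multiprocessing and ProcessPoolExecutor modules. This chapter introduces named pipes and shows how to solve problems with processes in the Parallel Python (PP) module.
This chapter covers the following topics:

Understanding interprocess communication
Introducing the PP module
Computing the Fibonacci sequence with PP on an SMP architecture
Building a parallel web crawler with PP
">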
<meta property="og:type" content="article">
<meta property="article:published_time" content="2015-10-03T12:44:35+08:00">
<meta property="article:tag" content="Parallel Programming with Python">
<meta property="article:tag" content="Python">
</head>
<body>
<a href="#content" class="sr-only sr-only-focusable">Skip to main content</a>

<!-- Menubar -->

<nav class="navbar navbar-inverse navbar-static-top"><div class="container">
<!-- This keeps the margins nice -->
        <div class="navbar-header">
            <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-navbar" aria-controls="bs-navbar" aria-expanded="false">
            <span class="sr-only">Toggle navigation</span>
            <span class="icon-bar"></span>
            <span class="icon-bar"></span>
            <span class="icon-bar"></span>
            </button>
            <a class="navbar-brand" href="https://muxuezi.github.io/">

                <span id="blog-title">绿萝间</span>
            </a>
        </div>
<!-- /.navbar-header -->
        <div class="collapse navbar-collapse" id="bs-navbar" aria-expanded="false">
            <ul class="nav navbar-nav">
<li>
<a href="../archive.html">Archive</a>
                </li>
<li>
<a href="../categories/">Tags</a>
                </li>
<li>
<a href="../rss.xml">RSS feed</a>

                
            </li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li>
    <a href="6utilizing-parallel-python.ipynb" id="sourcelink">Source</a>
    </li>

                
            </ul>
</div>
<!-- /.navbar-collapse -->
    </div>
<!-- /.container -->
</nav><!-- End of Menubar --><div class="container" id="content" role="main">
    <div class="body-content">
        <!--Body content-->
        <div class="row">
            
            
<article class="post-text h-entry hentry postpage" itemscope="itemscope" itemtype="http://schema.org/Article"><header><h1 class="p-name entry-title" itemprop="headline name"><a href="#" class="u-url">6.Utilizing Parallel Python</a></h1>

        <div class="metadata">
            <p class="byline author vcard"><span class="byline-name fn">
                    Tao Junjie
            </span></p>
            <p class="dateline"><a href="#" rel="bookmark"><time class="published dt-published" datetime="2015-10-03T12:44:35+08:00" itemprop="datePublished" title="2015-10-03 12:44">2015-10-03 12:44</time></a></p>
            
        <p class="sourceline"><a href="6utilizing-parallel-python.ipynb" id="sourcelink">Source</a></p>

        </div>
        

    </header><div class="e-content entry-content" itemprop="articleBody text">
    <div tabindex="-1" id="notebook" class="border-box-sizing">
    <div class="container" id="notebook-container">

<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
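<h2 id="用Parallel-Python模块">Using the Parallel Python Module<a class="anchor-link" href="6utilizing-parallel-python.html#%E7%94%A8Parallel-Python%E6%A8%A1%E5%9D%97">¶</a>
</h2>
<p>In the previous chapter we worked through two examples with the <code>multiprocessing</code> and <code>ProcessPoolExecutor</code> modules. This chapter introduces named pipes and shows how to solve problems with processes in the <strong>Parallel Python (PP)</strong> module.</p>
<p>This chapter covers the following topics:</p>
<ul>
<li>Understanding interprocess communication</li>
<li>Introducing the PP module</li>
<li>Computing the Fibonacci sequence with PP on an SMP architecture</li>
<li>Building a parallel web crawler with PP</li>
</ul>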
<!-- TEASER_END-->
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
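<h3 id="理解进程间通信">Understanding Interprocess Communication<a class="anchor-link" href="6utilizing-parallel-python.html#%E7%90%86%E8%A7%A3%E8%BF%9B%E7%A8%8B%E9%97%B4%E9%80%9A%E4%BF%A1">¶</a>
</h3>
<p>Interprocess communication (IPC) is the set of mechanisms that let processes exchange information.</p>
<p>There are several ways to implement IPC, and the right choice usually depends on the runtime environment. When all the processes run on the same machine, we can use mechanisms such as shared memory, message queues, and pipes. When the processes are spread across a distributed cluster, we can use sockets and remote procedure calls (RPC).</p>
<p>In <em>Chapter 5, Using the multiprocessing and ProcessPoolExecutor Modules</em>, we used regular pipes for communication between processes that share a common parent. Sometimes, however, unrelated processes (processes without a common parent) also need to communicate. Could they talk to each other through their address spaces? No: a process cannot directly access the address space of an unrelated process. Instead, we need a mechanism called a named pipe.</p>
<h4 id="命名管道简介">Introducing Named Pipes<a class="anchor-link" href="6utilizing-parallel-python.html#%E5%91%BD%E5%90%8D%E7%AE%A1%E9%81%93%E7%AE%80%E4%BB%8B">¶</a>
</h4>
<p>On POSIX systems such as Linux, almost everything can be treated as a file. Each resource we work with can be seen as a file, and we manipulate it through a file descriptor.</p>
<blockquote>
<p>A file descriptor is the mechanism that lets a program read from and write to an open file. Within a process, each open file is identified by its own descriptor. For details, see the <a href="http://www-01.ibm.com/support/knowledgecenter/ssw_aix_53/com.ibm.aix.genprogc/doc/genprogc/fdescript.htm%23vvnxfc2judy">file descriptor documentation</a>.</p>
</blockquote>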
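<p>As a minimal sketch of what a file descriptor looks like from Python, the following example creates an anonymous pipe with <code>os.pipe</code>, which returns two plain integers, and moves a few bytes through them with <code>os.write</code> and <code>os.read</code>. The helper name <code>echo_through_pipe</code> is made up for this illustration and is not part of the chapter's programs.</p>

```python
import os


def echo_through_pipe(payload):
    """Send a small bytes payload through an anonymous pipe and read it back."""
    # os.pipe() returns two file descriptors: plain integers that identify
    # the read end and the write end within the current process.
    read_fd, write_fd = os.pipe()
    try:
        # Keep the payload small so it fits in the pipe buffer and the
        # write cannot block while nobody is reading yet.
        os.write(write_fd, payload)
        return os.read(read_fd, len(payload))
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(echo_through_pipe(b"hello"))
```

<p>A named pipe is used in the same way, except that its descriptors are obtained by opening a path on the file system, so unrelated processes can reach it.</p>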
<h4 id="用Python的命名管道">Using Named Pipes in Python<a class="anchor-link" href="6utilizing-parallel-python.html#%E7%94%A8Python%E7%9A%84%E5%91%BD%E5%90%8D%E7%AE%A1%E9%81%93">¶</a>
</h4>
<p>Using named pipes in Python is straightforward. We will demonstrate one-way communication with two programs. The first, <code>write_to_named_pipe.py</code>, writes a 22-byte message containing the PID of the writing process to the pipe. The second, <code>read_from_named_pipe.py</code>, reads the message and displays it together with its own PID.</p>
<p>Running <code>read_from_named_pipe.py</code> prints output like the following:</p>

<pre><code>I pid [&lt;The PID of reader process&gt;] received a message =&gt; Hello from pid [the PID of writer process].</code></pre>
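<h5 id="写入命名管道">Writing to a Named Pipe<a class="anchor-link" href="6utilizing-parallel-python.html#%E5%86%99%E5%85%A5%E5%91%BD%E5%90%8D%E7%AE%A1%E9%81%93">¶</a>
</h5>
<p>In Python, named pipes are implemented through system calls. Let's walk through the code of <code>write_to_named_pipe.py</code> line by line.</p>
<p>First, we import the <code>os</code> module, which gives us access to the system calls:</p>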

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>In the main block we create a named pipe, a special FIFO file that carries the messages. The first line sets the name of the named pipe:</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">named_pipe</span> <span class="o">=</span> <span class="s2">"my_pipe"</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Next, we check whether the named pipe already exists. If it does not, we create it with <code>os.mkfifo</code>:</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">named_pipe</span><span class="p">):</span>
    <span class="n">os</span><span class="o">.</span><span class="n">mkfifo</span><span class="p">(</span><span class="n">named_pipe</span><span class="p">)</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
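<p>Here <code>os.mkfifo</code> creates a special file with FIFO (first in, first out) semantics, through which messages are written to and read from the named pipe.</p>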
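<p>To see that the file created this way really is a FIFO rather than a regular file, we can inspect its type with the standard <code>stat</code> module. This is a small sketch; the helper name <code>make_fifo</code> and the temporary directory are assumptions made for the illustration.</p>

```python
import os
import stat
import tempfile


def make_fifo(path):
    """Create a FIFO at path if nothing exists there; report whether path is a FIFO."""
    if not os.path.exists(path):
        os.mkfifo(path)
    # S_ISFIFO checks the file-type bits reported by os.stat.
    return stat.S_ISFIFO(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        print(make_fifo(os.path.join(tmp_dir, "my_pipe")))
```

<p>Note that <code>os.path.exists</code> alone does not tell us what kind of file is at the path; if a regular file named <code>my_pipe</code> already exists, the check above returns a false value.</p>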
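<p>Now we call the <code>write_message</code> function, passing it <code>named_pipe</code> and the message template <code>Hello from pid [%d]</code>. The function writes the message to the file that backs the named pipe. <code>write_message</code> is defined as follows:</p>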

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">write_message</span><span class="p">(</span><span class="n">input_pipe</span><span class="p">,</span> <span class="n">message</span><span class="p">):</span>
    <span class="n">fd</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">input_pipe</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">O_WRONLY</span><span class="p">)</span>
    <span class="n">os</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="p">(</span><span class="n">message</span> <span class="o">%</span> <span class="n">os</span><span class="o">.</span><span class="n">getpid</span><span class="p">())</span><span class="o">.</span><span class="n">encode</span><span class="p">())</span>
    <span class="n">os</span><span class="o">.</span><span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">)</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
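<p>The first line of the function uses the <code>os.open</code> system call. On success it returns a file descriptor that lets us read from or write to the FIFO file. The <code>flags</code> argument controls the mode in which the FIFO is opened; here <code>os.O_WRONLY</code> opens it for writing only:</p>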

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">fd</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">input_pipe</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">O_WRONLY</span><span class="p">)</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Once the named pipe is open, we can write to it. The message carries the PID of the writing process:</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">os</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="p">(</span><span class="n">message</span> <span class="o">%</span> <span class="n">os</span><span class="o">.</span><span class="n">getpid</span><span class="p">())</span><span class="o">.</span><span class="n">encode</span><span class="p">())</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Finally, remember to close the communication channel with <code>os.close()</code> so that the resources it uses are released:</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">os</span><span class="o">.</span><span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">)</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h5 id="读取命名管道">Reading from a Named Pipe<a class="anchor-link" href="6utilizing-parallel-python.html#%E8%AF%BB%E5%8F%96%E5%91%BD%E5%90%8D%E7%AE%A1%E9%81%93">¶</a>
</h5>
<p>The program <code>read_from_named_pipe.py</code> reads the message from the named pipe, again using the <code>os</code> module. The main block that drives it is simple. First we define the name of the named pipe:</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">named_pipe</span> <span class="o">=</span> <span class="s2">"my_pipe"</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Then we call the <code>read_message</code> function, which reads the message that <code>write_to_named_pipe.py</code> wrote to the named pipe. The code is as follows:</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">read_message</span><span class="p">(</span><span class="n">input_pipe</span><span class="p">):</span>
    <span class="n">fd</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">input_pipe</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">O_RDONLY</span><span class="p">)</span>
    <span class="n">message</span> <span class="o">=</span> <span class="p">(</span><span class="s2">"I pid [</span><span class="si">%d</span><span class="s2">] received a message =&gt; </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getpid</span><span class="p">(),</span> <span class="n">os</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="mi">22</span><span class="p">)</span><span class="o">.</span><span class="n">decode</span><span class="p">()))</span>
    <span class="n">os</span><span class="o">.</span><span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">message</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
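<p><code>os.open</code> is used as before. The new call here is <code>os.read</code>, which reads up to the given number of bytes, 22 in this example. Once the message has been read, the function returns it. Finally, remember to close the channel with <code>os.close</code> to release its resources.</p>
<blockquote>
<p>Whether the file descriptor was opened successfully should be checked. Handle the exceptions related to file descriptors and named pipes according to your own needs.</p>
</blockquote>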
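<p>One possible way to handle such errors is sketched below. It assumes a POSIX system, where opening a FIFO for writing with <code>os.O_NONBLOCK</code> fails with <code>ENXIO</code> when no process has it open for reading. The helper name <code>try_open_for_writing</code> is hypothetical and not part of the chapter's programs.</p>

```python
import errno
import os
import tempfile


def try_open_for_writing(pipe_path):
    """Return a write descriptor, or None when no reader has the FIFO open."""
    try:
        # With O_NONBLOCK, open() fails immediately instead of blocking
        # until a reader shows up.
        return os.open(pipe_path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as error:
        if error.errno == errno.ENXIO:  # the FIFO exists but has no reader
            return None
        raise  # anything else, such as a missing file, is a real error


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "my_pipe")
        os.mkfifo(path)
        print(try_open_for_writing(path))
```

<p>Without <code>os.O_NONBLOCK</code>, as in the chapter's programs, the writer simply blocks in <code>os.open</code> until a reader opens the pipe, which is often the behavior we want.</p>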
<p>Finally, the output of the two programs is shown in the following figure:
<img src="http://muxuezi.github.io/posts/ppp/ch6/namedpipe.png" alt=""></p>
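<p>The two programs can also be exercised from a single script. The sketch below is not the chapter's code: it assumes Python 3 on a POSIX system, starts the writer as a separate, unrelated interpreter process with <code>subprocess</code>, and opens the read end in non-blocking mode first so that neither side can hang while waiting for the other. The names <code>WRITER_SOURCE</code> and <code>round_trip</code> are made up for the illustration.</p>

```python
import os
import subprocess
import sys
import tempfile

# Source of a tiny writer program, run in its own interpreter process.
WRITER_SOURCE = """
import os, sys
fd = os.open(sys.argv[1], os.O_WRONLY)
os.write(fd, ("Hello from pid [%d]" % os.getpid()).encode())
os.close(fd)
"""


def round_trip():
    """Run the writer in an unrelated process and return the message it sent."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        pipe_path = os.path.join(tmp_dir, "my_pipe")
        os.mkfifo(pipe_path)
        # Opening the read end first, without blocking, lets the writer
        # open the FIFO immediately.
        read_fd = os.open(pipe_path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            writer = subprocess.Popen(
                [sys.executable, "-c", WRITER_SOURCE, pipe_path])
            writer.wait()  # the short message fits in the pipe buffer
            return os.read(read_fd, 64).decode()
        finally:
            os.close(read_fd)


if __name__ == "__main__":
    print("I pid [%d] received a message => %s" % (os.getpid(), round_trip()))
```

<p>Because the message is read only after the writer has exited, this version trades the simple blocking handshake of the original programs for predictable behavior in a single script.</p>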

</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
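<h3 id="探索Parallel-Python库">Discovering the Parallel Python Library<a class="anchor-link" href="6utilizing-parallel-python.html#%E6%8E%A2%E7%B4%A2Parallel-Python%E5%BA%93">¶</a>
</h3>
<p>The previous examples used system calls directly, a low-level way to implement interprocess communication and the foundation of IPC on Linux and Unix systems. Now we will use a Python module, PP, to set up IPC not only between processes on the same machine but also between machines on a distributed computing network.</p>
<p>Documentation for the PP module is fairly sparse; see the <a href="http://www.parallelpython.com/component/option,com_smf/">FAQ on the official site</a> for more information. The API reference describes how the module is used in more detail.</p>
<p>The biggest advantage of PP is the high level of abstraction it provides. Its main features are:</p>
<ul>
<li>Automatic detection of the number of processors to improve load balancing</li>
<li>The number of processors allocated can be changed at runtime</li>
<li>Load balancing at runtime</li>
<li>Automatic discovery of resources across the network</li>
</ul>
<p>The PP module implements parallelism in two ways. The first targets SMP architectures, where a single machine has multiple CPUs or cores. The second distributes tasks across machines on a network, forming a compute cluster. In both cases, processes exchange information through calls to high-level functions, so we do not have to deal with low-level details such as pipes and sockets. Information is exchanged simply through arguments and functions, as the examples below show.</p>
<p>The PP module provides a <code>Server</code> class that we can use to encapsulate tasks and dispatch them to local and remote processes. Its initializer (<code>__init__</code>) takes some important arguments:</p>
<ul>
<li>
<code>ncpus</code>: sets the number of worker processes. If it is not given, PP detects the number of CPUs/cores on the local machine and creates that many worker processes to execute tasks.</li>
<li>
<code>ppservers</code>: a tuple of the names or IP addresses of the machines that act as Parallel Python Execution Servers (PPES). A PPES is a machine on the network that runs the <code>ppserver.py</code> utility and waits for tasks to execute. For the related arguments, see the <a href="http://www.parallelpython.com/content/view/15/30/">documentation</a>.</li>
</ul>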
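<p>As a rough sketch of how these two arguments fit together, the helper below collects them into keyword arguments for <code>pp.Server</code>. The helper names <code>build_server_kwargs</code> and <code>make_server</code> and the host names <code>node-a</code> and <code>node-b</code> are hypothetical; only <code>ncpus</code> and <code>ppservers</code> come from the description above. The <code>pp</code> import is deferred so that the argument handling can be tried without the module installed.</p>

```python
def build_server_kwargs(ncpus=None, hosts=()):
    """Collect keyword arguments for pp.Server.

    Leaving ncpus as None omits the argument, so PP falls back to
    detecting the number of local CPUs/cores on its own.
    """
    kwargs = {"ppservers": tuple(hosts)}
    if ncpus is not None:
        kwargs["ncpus"] = ncpus
    return kwargs


def make_server(ncpus=None, hosts=()):
    """Create a pp.Server; requires the third-party pp module."""
    import pp  # imported lazily: not part of the standard library

    return pp.Server(**build_server_kwargs(ncpus, hosts))


if __name__ == "__main__":
    # Local only, worker count detected by PP.
    print(build_server_kwargs())
    # Two local workers plus two hypothetical remote execution servers.
    print(build_server_kwargs(ncpus=2, hosts=["node-a", "node-b"]))
```

<p>Calling <code>make_server()</code> with no arguments corresponds to the plain <code>pp.Server()</code> call used in the SMP example later in this chapter.</p>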
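<p>Instances of the <code>Server</code> class have several methods; the <code>submit</code> method dispatches tasks to their destinations. Its signature is:</p>

<pre><code>submit(self, func, args=(), depfuncs=(), modules=(),
       callback=None, callbackargs=(), group='default',
       globals=None)</code></pre>
<p>The main arguments of <code>submit</code> are:</p>
<ul>
<li>
<code>func</code>: the function to be executed by the local processes or the remote servers</li>
<li>
<code>args</code>: the arguments passed to <code>func</code>
</li>
<li>
<code>modules</code>: the modules that the remote code or process must import in order to execute <code>func</code>. For example, if the dispatched function uses the <code>time</code> module, pass <code>modules=('time', )</code>
</li>
<li>
<code>callback</code>: a callback function that we will use later. It is called with the result once <code>func</code> has finished and is the place to process that result.</li>
</ul>
<p>The remaining arguments are covered later.</p>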
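<p>The sketch below puts the four arguments together, following the <code>time</code> example from the list above. The functions <code>timed_sum</code> and <code>submit_timed_sums</code> are invented for the illustration, and the <code>pp</code> import is deferred so that the task function itself can be run without the module installed.</p>

```python
import time


def timed_sum(limit):
    """Sum the integers below limit and report how long that took."""
    started = time.time()
    total = sum(range(limit))
    return total, time.time() - started


def submit_timed_sums(limits):
    """Dispatch timed_sum once per limit and collect the results via callback."""
    import pp  # third-party module, imported lazily

    results = []
    job_server = pp.Server()
    for limit in limits:
        job_server.submit(
            timed_sum,                # func: what the worker executes
            (limit,),                 # args: a tuple, even for one argument
            modules=("time",),        # imported in the worker before the call
            callback=results.append)  # called with each finished result
    job_server.wait()  # block until every dispatched task has finished
    return results


if __name__ == "__main__":
    # The task function is plain Python and can be checked locally first.
    print(timed_sum(1000)[0])
```

<p>Results arrive in completion order, so <code>results</code> is not guaranteed to follow the order of <code>limits</code>; the examples in this chapter return the input alongside the result for that reason.</p>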

</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
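<h3 id="用PP在SMP架构实现多输入的Fibonacci数列">Computing the Fibonacci Sequence for Multiple Inputs with PP on an SMP Architecture<a class="anchor-link" href="6utilizing-parallel-python.html#%E7%94%A8PP%E5%9C%A8SMP%E6%9E%B6%E6%9E%84%E5%AE%9E%E7%8E%B0%E5%A4%9A%E8%BE%93%E5%85%A5%E7%9A%84Fibonacci%E6%95%B0%E5%88%97">¶</a>
</h3>
<p>Time to get our hands dirty! Let's use the PP module to compute Fibonacci values for multiple inputs on an SMP architecture. I am running the program on a laptop with a dual-core, four-thread processor.</p>
<p>We only need to import two modules, <code>os</code> and <code>pp</code>; <code>os</code> is used only to obtain process PIDs. We define a list, <code>input_list</code>, that holds the input values and a dictionary, <code>result_dict</code>, that stores the final results. The code is as follows:</p>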

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">pp</span>

<span class="n">input_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">10</span><span class="p">]</span>
<span class="n">result_dict</span> <span class="o">=</span> <span class="p">{}</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
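<p>Next, we define the function <code>fibo_task</code>, which the worker processes will execute in parallel. It is passed as the <code>func</code> argument of the <code>submit</code> method of the <code>Server</code> class. The function is almost the same as the version in the previous chapter; the only difference is that it now returns a tuple of two elements: the input value and a string containing the process PID and the computed Fibonacci value. The function is defined as follows:</p>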

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">fibo_task</span><span class="p">(</span><span class="n">value</span><span class="p">):</span>
    <span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span>
    <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">value</span><span class="p">):</span>
        <span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="o">=</span> <span class="n">b</span><span class="p">,</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
    <span class="n">message</span> <span class="o">=</span> <span class="s2">"the fibonacci calculated by pid </span><span class="si">%d</span><span class="s2"> was </span><span class="si">%d</span><span class="s2">"</span> \
        <span class="o">%</span> <span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getpid</span><span class="p">(),</span> <span class="n">a</span><span class="p">)</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
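<p>The next step is to define the callback function, which we call <code>aggregate_results</code>. It runs whenever <code>fibo_task</code> finishes a task. Its implementation is simple: it prints a status message and stores the message returned by <code>fibo_task</code> in <code>result_dict</code>, keyed by the input value. The code is as follows:</p>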

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">aggregate_results</span><span class="p">(</span><span class="n">result</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Computing results with PID [</span><span class="si">%d</span><span class="s2">]"</span> <span class="o">%</span> <span class="n">os</span><span class="o">.</span><span class="n">getpid</span><span class="p">())</span>
    <span class="n">result_dict</span><span class="p">[</span><span class="n">result</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span> <span class="o">=</span> <span class="n">result</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Now that both functions are defined, we create an instance of the <code>Server</code> class to dispatch the tasks:</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">job_server</span> <span class="o">=</span> <span class="n">pp</span><span class="o">.</span><span class="n">Server</span><span class="p">()</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
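<p>In this example the <code>Server</code> is created with its default argument values. The next section makes use of some of the other available arguments.</p>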
<p>With the <code>Server</code> instance in place, we iterate over <code>input_list</code> and dispatch <code>fibo_task</code> through <code>submit</code>. Each input value is passed in the <code>args</code> tuple, <code>modules</code> names the <code>os</code> module that the function needs to import, and <code>callback</code> is set to <code>aggregate_results</code>. The code is as follows:</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">input_list</span><span class="p">:</span>
    <span class="n">job_server</span><span class="o">.</span><span class="n">submit</span><span class="p">(</span>
        <span class="n">fibo_task</span><span class="p">,</span> <span class="p">(</span><span class="n">item</span><span class="p">,),</span> <span class="n">modules</span><span class="o">=</span><span class="p">(</span><span class="s1">'os'</span><span class="p">,),</span> <span class="n">callback</span><span class="o">=</span><span class="n">aggregate_results</span><span class="p">)</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Finally, we have to wait until all the dispatched tasks have finished, which we do by calling the <code>wait</code> method:</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">job_server</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
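<blockquote>
<p>There is another way to obtain the result of the executed function without using a <code>callback</code>. The <code>submit</code> method returns an object of type <code>pp._Task</code>, which holds the result once the execution has finished.</p>
</blockquote>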
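<p>A sketch of that callback-free variant follows. It relies on the task object returned by <code>submit</code> being callable, with the call blocking until the result is available, as in the examples on the PP site. The function <code>run_without_callback</code> is made up for this illustration, and <code>fibo_task</code> is repeated so that the block stands on its own.</p>

```python
import os


def fibo_task(value):
    """Same task as in the chapter: return (input, message with PID and result)."""
    a, b = 0, 1
    for item in range(value):
        a, b = b, a + b
    message = "the fibonacci calculated by pid %d was %d" % (os.getpid(), a)
    return (value, message)


def run_without_callback(inputs):
    """Dispatch one task per input and read each result from its task object."""
    import pp  # third-party module, imported lazily

    job_server = pp.Server()
    jobs = [job_server.submit(fibo_task, (value,), modules=("os",))
            for value in inputs]
    results = {}
    for job in jobs:
        value, message = job()  # blocks until this task has finished
        results[value] = message
    return results


if __name__ == "__main__":
    # fibo_task can be checked in the current process without PP.
    print(fibo_task(10)[1])
```

<p>All tasks are submitted before any result is requested, so they can still run in parallel; only the collection loop is sequential.</p>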
<p>We display the results by printing the <code>result_dict</code> dictionary:</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s2">"Main process PID [</span><span class="si">%d</span><span class="s2">]"</span> <span class="o">%</span> <span class="n">os</span><span class="o">.</span><span class="n">getpid</span><span class="p">())</span>
<span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">result_dict</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"For input </span><span class="si">%d</span><span class="s2">, </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">))</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The final result is shown in the following figure:
<img src="http://muxuezi.github.io/posts/ppp/ch6/fibonacci_pp_smp.png" alt=""></p>
<p><a href="http://muxuezi.github.io/posts/ppp/ch6/fibonacci_pp_smp.py">Source code</a></p>
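<p>As a quick cross-check that does not need PP, the values the parallel run should report for <code>input_list</code> can be computed sequentially with the same loop. The names <code>fibonacci</code> and <code>expected</code> are introduced here only for the illustration.</p>

```python
def fibonacci(n):
    """Return the n-th Fibonacci number with the same loop as fibo_task."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


input_list = [4, 3, 8, 6, 10]
expected = {value: fibonacci(value) for value in input_list}

if __name__ == "__main__":
    for value in input_list:
        print("For input %d, the expected value is %d" % (value, expected[value]))
    # expected == {4: 3, 3: 2, 8: 21, 6: 8, 10: 55}
```

<p>Only the PIDs in the messages should differ between runs; the Fibonacci values themselves are deterministic.</p>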

</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
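<h3 id="用PP实现分布式网络爬虫">Building a Distributed Web Crawler with PP<a class="anchor-link" href="6utilizing-parallel-python.html#%E7%94%A8PP%E5%AE%9E%E7%8E%B0%E5%88%86%E5%B8%83%E5%BC%8F%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%AB">¶</a>
</h3>
<p>Having used PP to dispatch tasks to local processes, let's now look at distributed execution. We will use the following three machines:</p>
<ul>
<li>Iceman-Thinkad-X220: Ubuntu 13.10</li>
<li>Iceman-Q47OC-500P4C: Ubuntu 12.04 LTS</li>
<li>Asgard-desktop: Elementary OS</li>
</ul>
<p>We will use PP to distribute tasks across these three machines, reusing the web crawler from the earlier chapters. The code is in the file <code>web_crawler_pp_cluster.py</code>. For each URL in <code>url_list</code>, we dispatch a task to a local or remote process, and a <code>callback</code> function then stores the first three links found at that URL.</p>
<p>Let's go through the solution step by step. First, we import the required modules and define our data structures. As in the previous section, we create a list, here called <code>url_list</code>, that holds the URLs and a dictionary, <code>result_dict</code>, that stores the final crawl results.</p>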

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="kn">import</span> <span class="nn">requests</span>
<span class="kn">import</span> <span class="nn">pp</span>

<span class="n">url_list</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'http://www.google.com/'</span><span class="p">,</span> <span class="s1">'http://gizmodo.uol.com.br/'</span><span class="p">,</span>
            <span class="s1">'https://github.com/'</span><span class="p">,</span> <span class="s1">'http://br.search.yahoo.com/'</span><span class="p">,</span>
            <span class="s1">'http://www.python.org/'</span><span class="p">,</span> <span class="s1">'http://www.python.org/psf/'</span><span class="p">]</span>

<span class="n">result_dict</span> <span class="o">=</span> <span class="p">{}</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
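<p>Now we define the callback function <code>aggregate_results</code>, a slightly modified version of the callback that collected the Fibonacci results in the previous section. Only the format of the message stored in the dictionary changes: it now includes the PID of the process, the hostname of the machine that ran it, and the first three links found. The code is as follows:</p>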

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">aggregate_results</span><span class="p">(</span><span class="n">result</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Computing results in main process PID [</span><span class="si">%d</span><span class="s2">]"</span> <span class="o">%</span> <span class="n">os</span><span class="o">.</span><span class="n">getpid</span><span class="p">())</span>
    <span class="n">message</span> <span class="o">=</span> <span class="s2">"PID </span><span class="si">%d</span><span class="s2"> in hostname [</span><span class="si">%s</span><span class="s2">] the following links were found: </span><span class="si">%s</span><span class="s2">"</span>\
        <span class="o">%</span> <span class="p">(</span><span class="n">result</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">result</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span> <span class="n">result</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
    <span class="n">result_dict</span><span class="p">[</span><span class="n">result</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span> <span class="o">=</span> <span class="n">message</span>
</pre></div>

</div>
</div>
</div>

</div>
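<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>To see what the callback stores, it can be called by hand with a made-up result. The sketch below assumes the task returns a tuple laid out as <code>(url, links, pid, hostname)</code>, which is what the indices in the callback imply; the sample values (<code>4242</code>, <code>worker-1</code> and the three links) are hypothetical. It unpacks the tuple by name instead of by index, purely for readability.</p>

```python
import os

result_dict = {}


def aggregate_results(result):
    # Assumed layout of the task result: (url, links, pid, hostname).
    url, links, pid, hostname = result
    print("Computing results in main process PID [%d]" % os.getpid())
    result_dict[url] = (
        "PID %d in hostname [%s] the following links were found: %s"
        % (pid, hostname, links)
    )


# Hypothetical sample result; PP would pass the real return value of the task.
sample = ("http://www.python.org/", ["/a", "/b", "/c"], 4242, "worker-1")
aggregate_results(sample)
print(result_dict[sample[0]])
```

<p>The dictionary ends up keyed by URL, so a later result for the same URL would overwrite the earlier message.</p>
</div>
</div>
</div>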
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
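<p>Next we define the <code>crawl_task</code> function, which will later be submitted to the <code>Server</code> instance. As with the earlier task function, its job is to fetch the page at a URL and keep the first three links found in it. The only difference is the structure of the returned tuple, which now holds the URL, the links, the PID of the worker process and the hostname of the machine:</p>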

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">crawl_task</span><span class="p">(</span><span class="n">url</span><span class="p">):</span>
    <span class="n">html_link_regex</span> <span class="o">=</span> \
        <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s1">'&lt;a\s(?:.*?\s)*?href=[</span><span class="se">\'</span><span class="s1">"](.*?)[</span><span class="se">\'</span><span class="s1">"].*?&gt;'</span><span class="p">)</span>

    <span class="n">request_data</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
    <span class="c1"># keep only the first 3 links</span>
    <span class="n">links</span> <span class="o">=</span> <span class="n">html_link_regex</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="n">request_data</span><span class="o">.</span><span class="n">text</span><span class="p">)[:</span><span class="mi">3</span><span class="p">]</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">links</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">getpid</span><span class="p">(),</span> <span class="n">os</span><span class="o">.</span><span class="n">uname</span><span class="p">()[</span><span class="mi">1</span><span class="p">])</span>
</pre></div>

</div>
</div>
</div>

</div>
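<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The link-extraction step can be tried without any network access. The sketch below applies the same pattern as <code>crawl_task</code>, written as a raw string, to a small hand-written HTML snippet; the snippet and its links are hypothetical and stand in for <code>request_data.text</code>.</p>

```python
import re

# Same pattern as in crawl_task, written as a raw string.
html_link_regex = re.compile(r'<a\s(?:.*?\s)*?href=[\'"](.*?)[\'"].*?>')

# Hypothetical page content used only for illustration.
sample_html = (
    '<a href="/about">About</a>'
    '<a class="nav" href="/docs">Docs</a>'
    "<a href='/blog'>Blog</a>"
    '<a href="/jobs">Jobs</a>'
)

# findall returns the captured href values; the slice keeps the first three.
links = html_link_regex.findall(sample_html)[:3]
print(links)
```

<p>The fourth link is found by <code>findall</code> but dropped by the slice. A regular expression is enough for this demonstration, though it is not a general HTML parser.</p>
</div>
</div>
</div>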
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
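<p>With the task function and the callback in place, we can use a <code>Server</code> instance to distribute tasks to machines on the network. Several parameters are set when the <code>Server</code> is initialized. The first is the list of IP addresses of the machines that will run tasks. In our example, the addresses of the two machines other than the local one are packed into a tuple named <code>ppservers</code>:</p>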

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">ppservers</span> <span class="o">=</span> <span class="p">(</span><span class="s2">"192.168.25.21"</span><span class="p">,</span> <span class="s2">"192.168.25.9"</span><span class="p">)</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<blockquote>
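<p>If you would rather not list specific IP addresses, or there are too many machines to list by hand, you can use the <code>*</code> wildcard in the <code>ppservers</code> tuple and let PP discover the available nodes.</p>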
</blockquote>
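<p>As a rough sketch, the two ways of filling in <code>ppservers</code> look like this. The names <code>explicit_ppservers</code>, <code>auto_ppservers</code> and <code>make_dispatcher</code> are hypothetical and used only for illustration; the <code>pp</code> import is placed inside the helper so the snippet loads even where PP is not installed.</p>

```python
# Explicit addresses: the client connects only to the listed nodes.
explicit_ppservers = ("192.168.25.21", "192.168.25.9")

# Wildcard: rely on auto-discovery instead of listing addresses.
# The trailing comma matters; ("*") would be a plain string, not a tuple.
auto_ppservers = ("*",)


def make_dispatcher(ppservers):
    """Hypothetical helper: build a Server like the one used in this section."""
    import pp  # imported lazily so the sketch loads even without PP installed
    return pp.Server(ncpus=1, ppservers=ppservers, socket_timeout=60000)
```

<p>For the wildcard form to find anything, the remote nodes must run <code>ppserver.py</code> with its auto-discovery option <code>-a</code>.</p>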
<p>With the <code>ppservers</code> tuple defined, we create the <code>Server</code> instance:</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">job_dispatcher</span> <span class="o">=</span> <span class="n">pp</span><span class="o">.</span><span class="n">Server</span><span class="p">(</span><span class="n">ncpus</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">ppservers</span><span class="o">=</span><span class="n">ppservers</span><span class="p">,</span> <span class="n">socket_timeout</span><span class="o">=</span><span class="mi">60000</span><span class="p">)</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
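<p>The setup differs slightly from the previous section. First, <code>ncpus</code> is set to <code>1</code>, so PP starts only one worker process on the local machine and sends the remaining tasks to the other two machines on the network. The second argument, <code>ppservers</code>, is the tuple of IP addresses created earlier. The last argument, <code>socket_timeout</code>, is the socket timeout in seconds, which bounds how long a remote job may run. It is set to the very large value 60000 so that the connection is not closed during the demonstration just because a task takes a long time to finish.</p>
<p>Once the <code>Server</code> instance exists, we dispatch the tasks. A loop iterates over the URLs and passes each one to the <code>submit</code> method of the <code>Server</code> instance, which assigns it to one of the available machines:</p>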

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">for</span> <span class="n">url</span> <span class="ow">in</span> <span class="n">url_list</span><span class="p">:</span>
    <span class="n">job_dispatcher</span><span class="o">.</span><span class="n">submit</span><span class="p">(</span><span class="n">crawl_task</span><span class="p">,</span> <span class="p">(</span><span class="n">url</span><span class="p">,),</span>
                          <span class="n">modules</span><span class="o">=</span><span class="p">(</span><span class="s1">'os'</span><span class="p">,</span> <span class="s1">'re'</span><span class="p">,</span> <span class="s1">'requests'</span><span class="p">,),</span>
                          <span class="n">callback</span><span class="o">=</span><span class="n">aggregate_results</span><span class="p">)</span>
</pre></div>

</div>
</div>
</div>

</div>
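<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The submit, callback and wait pattern can be tried on a single machine without a PP cluster. The sketch below is a stand-in that uses the standard library's <code>concurrent.futures.ThreadPoolExecutor</code> rather than PP, so it illustrates only the shape of the dispatch loop, not distributed execution. <code>fake_crawl_task</code> and its made-up links are hypothetical and replace the real network request.</p>

```python
import os
import socket
from concurrent.futures import ThreadPoolExecutor

url_list = ["http://www.python.org/", "https://github.com/"]
result_dict = {}


def fake_crawl_task(url):
    # Hypothetical stand-in for crawl_task: no network access, made-up links.
    links = ["%slink%d" % (url, i) for i in range(3)]
    return (url, links, os.getpid(), socket.gethostname())


def aggregate_results(future):
    # concurrent.futures hands the callback a Future; PP passes the result itself.
    url, links, pid, hostname = future.result()
    result_dict[url] = "PID %d in hostname [%s] found: %s" % (pid, hostname, links)


# Leaving the with-block waits for every task, much like job_dispatcher.wait().
with ThreadPoolExecutor(max_workers=2) as executor:
    for url in url_list:
        executor.submit(fake_crawl_task, url).add_done_callback(aggregate_results)

for key, value in result_dict.items():
    print("** For url %s, %s\n" % (key, value))
```

<p>Because threads share one process, every message reports the same PID here; with PP the PIDs and hostnames would differ per node.</p>
</div>
</div>
</div>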
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
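<p>The main difference from the <code>submit</code> call in the Fibonacci example is the <code>modules</code> argument, which lists the modules that must be imported wherever the task runs.</p>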
<blockquote>
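<p>You may wonder why the PP module itself is not listed in <code>modules</code>. The PP execution environment already imports <code>pp</code> for us; after all, the remote nodes need it anyway.</p>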
</blockquote>
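<p>After dispatching the tasks, we call <code>wait</code> to block until they have all completed. We then print the collected results and call <code>print_stats</code>, a <code>Server</code> method that displays some interesting execution statistics. The code is as follows:</p>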

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [ ]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">job_dispatcher</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>

<span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">result_dict</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"** For url </span><span class="si">%s</span><span class="s2">, </span><span class="si">%s</span><span class="se">\n</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">))</span>

<span class="c1"># print_stats() writes the statistics itself, so there is no return value to print</span>
<span class="n">job_dispatcher</span><span class="o">.</span><span class="n">print_stats</span><span class="p">()</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
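<p>Before running the program, we need to start <code>ppserver.py</code> on each remote machine by running <code>ppserver.py -a -d</code>. The <code>-a</code> option enables auto-discovery, so that clients and servers can find each other even when the IP addresses were not listed in the <code>ppservers</code> tuple. The <code>-d</code> option turns on debug mode, which prints log messages while the server runs.</p>
<p>Let us look at the results:</p>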
<ul>
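<li>First, the output of the main node, which creates and dispatches the tasks, is shown below. It includes some interesting statistics: the number of tasks sent to each node, the total time spent on them, the average time per task, and the IP address of each node together with its wait time limit. Another point worth noting is that the callback runs only in the main process on the main node, so keep callbacks lightweight; a heavy callback would consume too much of the main node's resources.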
<img src="http://muxuezi.github.io/posts/ppp/ch6/mainnodes.png" alt="">
</li>
<li>
<p>Then <code>ppserver.py</code> starts up on each of the two remote machines and processes its tasks, as the following screenshots show.</p>
<ul>
<li>
<p>Output on the machine <code>iceman-Q47OC-500P4C</code>:
<img src="http://muxuezi.github.io/posts/ppp/ch6/remotenodes1.png" alt=""></p>
</li>
<li>
<p>Output on the machine <code>asgard-desktop</code>:
<img src="http://muxuezi.github.io/posts/ppp/ch6/remotenodes2.png" alt=""></p>
</li>
</ul>
</li>
</ul>
<p><a href="http://muxuezi.github.io/posts/ppp/ch6/web_crawler_pp_cluster.py">Source code</a></p>

</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="本章小结">Summary<a class="anchor-link" href="6utilizing-parallel-python.html#%E6%9C%AC%E7%AB%A0%E5%B0%8F%E7%BB%93">¶</a>
</h3>
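<p>In this chapter we first implemented IPC at a low level with named pipes, then solved two problems with the PP module, whose higher-level abstraction simplifies both IPC and distributed processing. PP is a good fit for building simple, small, parallel, distributed Python applications.</p>
<p>In the next chapter we will use the well-known Celery module to run parallel, distributed tasks.</p>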

</div>
</div>
</div>
    </div>
  </div>

    </div>
    <aside class="postpromonav"><nav><ul itemprop="keywords" class="tags">
<li><a class="tag p-category" href="../categories/parallel-programming-with-python.html" rel="tag">Parallel Programming with Python</a></li>
            <li><a class="tag p-category" href="../categories/python.html" rel="tag">Python</a></li>
        </ul>
<ul class="pager hidden-print">
<li class="previous">
                <a href="slott-2015115-2015-10-01-report.html" rel="prev" title="Double Color Ball lottery draw 2015115 (2015-10-01) data analysis report">Previous post</a>
            </li>
            <li class="next">
                <a href="dlott-15115-2015-10-03-report.html" rel="next" title="Super Lotto draw 15115 (2015-10-03) data analysis report">Next post</a>
            </li>
        </ul></nav></aside><script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script><script type="text/x-mathjax-config">
MathJax.Hub.Config({
    tex2jax: {
        inlineMath: [ ['$','$'], ["\\(","\\)"] ],
        displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
        processEscapes: true
    },
    displayAlign: 'center', // Change this to 'center' to center equations.
    "HTML-CSS": {
        styles: {'.MathJax_Display': {"margin": 0}}
    }
});
</script></article>
</div>
        <!--End of body content-->

        <footer id="footer">
            Contents © 2017         <a href="mailto:muxuezi@gmail.com">Tao Junjie</a> - Powered by         <a href="https://getnikola.com" rel="nofollow">Nikola</a>         
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0">
<img alt="Creative Commons License BY-NC-SA" style="border-width:0; margin-bottom:12px;" src="http://i.creativecommons.org/l/by-nc-sa/4.0/80x15.png"></a>
            
        </footer>
</div>
</div>


            <script src="../assets/js/all-nocdn.js"></script><script>$('a.image-reference:not(.islink) img:not(.islink)').parent().colorbox({rel:"gal",maxWidth:"100%",maxHeight:"100%",scalePhotos:true});</script><!-- fancy dates --><script>
    moment.locale("en");
    fancydates(0, "YYYY-MM-DD HH:mm");
    </script><!-- end fancy dates --><script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-51330059-1', 'auto');
  ga('send', 'pageview');

</script>
</body>
</html>
